Although family history is a risk factor for pancreatic adenocarcinoma, much of the
genetic etiology of this disease remains unknown. While genome-wide association studies 
have identified some common single nucleotide polymorphisms (SNPs) associated with 
pancreatic cancer risk, these SNPs do not explain all the heritability of this disease. We 
hypothesized that copy number variations (CNVs) in the genome may play a role in genetic
predisposition to pancreatic adenocarcinoma. Here, we report a genome-wide analysis 
of CNVs in a small hospital-based, European ancestry cohort of pancreatic cancer cases 
and controls. Germline CNV discovery was performed using the Illumina Human CNV370
platform in 223 pancreatic cancer cases (both sporadic and familial) and 169 controls. 
Following stringent quality control, we asked if global CNV burden was a risk factor for 
pancreatic cancer. Finally, we performed in silico CNV genotyping and association testing 
to discover novel CNV risk loci. When we examined the global CNV burden, we found 
no strong evidence that CNV burden plays a role in pancreatic cancer risk either overall 
or specifically in individuals with a family history of the disease. Similarly, we saw no 
significant evidence that any particular CNV is associated with pancreatic cancer risk. 
Taken together, these data suggest that CNVs do not contribute substantially to the 
genetic etiology of pancreatic cancer, though the results are tempered by small sample 
size and large experimental variability inherent in array-based CNV studies. 



Keywords: pancreatic cancer, copy number variation, cancer risk, SNP microarrays, CNVs 



INTRODUCTION 

Pancreatic adenocarcinoma is the fourth-leading cause of cancer 
mortality in the United States for both men and women (Siegel
et al., 2012). Despite recent advances in screening methods and
surgical treatment, it is a rapidly fatal disease with a poor 5-year 
survival rate of 5-6%. Thus, a challenge exists to develop new and 
more effective therapeutic interventions. 

Inherited genetic predisposition to pancreatic cancer is 
hypothesized to play a role in both familial and non-familial 
forms of the disease. In large-scale genome-wide association 
studies, common single-nucleotide polymorphisms (SNPs) on 
chromosomes 9q34, 13q22, 1q32, and 5p15 were associated
with pancreatic cancer risk (Amundadottir et al., 2009; Petersen
et al., 2010); however, the true causal variants underlying these
associations and their functional mechanisms remain unclear.
Additional studies have focused their analyses on SNPs within
candidate genes (Jiao et al., 2006, 2008; Li et al., 2007; McWilliams
et al., 2008). Under this approach, SNPs within DNA damage
response and repair genes (particularly ATM, LIG3, XRCC1,
and XRCC2) were associated with increased risk, suggesting
the involvement of inherited genetic variants within these
pathways in pancreatic tumorigenesis.



Importantly, such efforts to identify inherited genetic variants 
that contribute to pancreatic cancer susceptibility may lead to 
novel biological insights about the disease and useful biomarkers 
for risk prediction. However, while these efforts have primarily 
focused on the analysis of SNPs, the additional contribution of 
germline copy number variations (CNVs) remains unclear. 

CNVs are generally defined as inherited or de novo deletions or 
duplications of the genome ranging in size from 100 bp to 3 Mb 
(Zhang et al., 2009). Such variations may lead to changes in gene
dosage and/or expression (Diskin et al., 2009). As of August 2012,
approximately 179,450 human CNVs had been reported in the
Database of Genomic Variants (DGV) (Iafrate et al., 2004; Zhang
et al., 2006). Although there are substantially fewer reported
CNVs than SNPs, it is estimated that more than 30% of the 
human genome is covered by at least one CNV (compared to 
the <1% covered by SNPs). Thus, CNVs are hypothesized to be 
of functional significance. 

The specific role of CNVs in familial forms of pancreatic 
cancer has been investigated previously. Lucito et al. used
representational oligonucleotide microarray analysis (ROMA) to
identify a total of 56 germline CNVs that were present in patients
with a family history of pancreatic cancer and absent from a
cohort of healthy controls (Lucito et al., 2007). Al-Sukhni et al.
followed a similar approach by analyzing the germline DNA of
91 familial pancreatic cancer patients and 950 healthy controls
using high-resolution Affymetrix 500 K and SNP 6.0 platforms
(Al-Sukhni et al., 2012). There, investigators found a total of
93 germline CNVs that were unique to familial pancreatic cancer
patients. Taken together, these studies nominate several CNVs
as putative risk loci for familial pancreatic cancer. However,
additional studies are needed to confirm these findings. More
importantly, evidence for the broader role of CNVs outside of the
familial pancreatic cancer setting is still unclear.

Here, we report a genome-wide analysis of CNVs in a hospital-based,
European ancestry cohort of pancreatic cancer cases and
controls. Germline CNV discovery was performed using the
Illumina Human CNV370 platform in 223 pancreatic cancer cases
(both sporadic and familial) and 169 controls. Following stringent
quality control, we explored whether global CNV burden was
a risk factor for pancreatic cancer. Finally, we performed in silico
CNV genotyping and association testing to discover novel CNV 
risk loci. 

MATERIALS AND METHODS 

SAMPLE COLLECTION AND SNP ARRAY GENOTYPING 

Participants were part of an ongoing hospital-based case-control
project conducted in conjunction with the Familial
Pancreatic Tumor Registry (FPTR) at Memorial Sloan-Kettering 
Cancer Center (MSKCC; New York, NY) as described previously 
(Mukherjee et al., 2011; Willis et al., 2012). Briefly, patients were 
eligible if they were 21 years or older, spoke English, and had 
pathologically or cytologically confirmed adenocarcinoma of the 
pancreas. Patients were recruited between June 2003 and July 2009 
from the surgical and medical oncology clinics at MSKCC at the 
time of their initial diagnosis or during follow-up. Controls were 
spouses of patients or visitors accompanying patients with other 
diseases, had the same age and language eligibility requirements 
as the cases, had no personal history of cancer (except for
non-melanoma skin cancer), and were not blood relatives of the cases.
The participation rate among approached and eligible individuals
was 76% among cases and 56% among controls. The study
was approved by the MSKCC Institutional Review Board, and all 
enrolled participants signed informed consent. 

Consented participants provided a blood or buccal (mouthwash
or saliva) sample to the MSKCC FPTR research study assistant
and completed risk factor and family history questionnaires
administered by the research study assistant in person or via
telephone. Biospecimens were subsequently delivered for genomic
DNA extraction and banking to the Molecular Epidemiology
Laboratory. DNA was isolated from mouthwash specimens using
the Puregene DNA Purification Kit (Qiagen, Inc.), from saliva
samples with the Oragene saliva kits (DNA Genotek), and from
whole blood using the Gentra Puregene Blood Kit (Qiagen, Inc.).
DNA samples were hydrated in 1× TE buffer. The quality and
quantity of the DNA were assessed by spectrophotometry and
agarose gel electrophoresis.

A total of 464 individuals (263 cases and 201 controls)
recruited at MSKCC were available for inclusion in downstream
analyses. DNA samples were genotyped in 28 batches on the
Illumina Human CNV370 bead array (either the Illumina Human
CNV370-Duo or Illumina Human CNV370-Quad format) at the
Genomics Core Laboratory of MSKCC according to the
manufacturer's protocol. Normalization and SNP genotype calling were
performed using the Illumina BeadStudio software package. Ten
individuals had their DNA analyzed twice for quality control during
the course of the genotyping experiments, yielding a total
of 474 samples. Normalized probe intensities were exported for
downstream CNV discovery and genotyping.

SNP genotype calls for 474 individual samples (including 10
duplicate pairs) were exported to PLINK (version 1.07; Purcell
et al., 2007) for processing. Identity-by-descent (IBD) analysis
was performed to confirm that none of the genotyped individuals
were blood relatives. For each known duplicate sample
pair, priority was given to the sample that passed CNV-level
quality control (described below). We removed SNPs with call
rates <95%, minor allele frequency <5%, or Hardy-Weinberg
equilibrium (HWE) test p-value < 1 × 10⁻⁷ in controls, leaving
a total of 315,136 SNPs.
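
The three SNP-level filters can be sketched as follows. This is an illustration in Python only, not the PLINK commands used in the study; the `SnpStats` record, its field names, and the example SNP identifiers are hypothetical, and the per-SNP statistics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_CALL_RATE = 0.95
MIN_MAF = 0.05
MIN_HWE_P = 1e-7


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    call_rate: float        # fraction of samples with a genotype call
    maf: float              # minor allele frequency
    hwe_p_controls: float   # HWE test p-value computed in controls


def passes_snp_qc(snp: SnpStats) -> bool:
    """Return True if the SNP survives all three filters."""
    return (
        snp.call_rate >= MIN_CALL_RATE
        and snp.maf >= MIN_MAF
        and snp.hwe_p_controls >= MIN_HWE_P
    )


def filter_snps(snps):
    """Keep only SNPs that pass QC, preserving input order."""
    return [s for s in snps if passes_snp_qc(s)]
```

A SNP is removed as soon as it fails any one criterion, which matches the "or" in the description above.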

Population structure was examined by principal component
analysis (PCA) of the SNP genotype calls. As reference, we
obtained whole-genome SNP data for Utah residents with northern
and western European ancestry (abbreviated CEU) and
the Toscani in Italia population (abbreviated TSI) from
the International HapMap project (phase 3, draft release 2)
(International HapMap 3 Consortium et al., 2010). The top
four principal components of genetic structure identified by
EIGENSOFT were used as covariates in downstream CNV association
testing (Patterson et al., 2006).

CNV DISCOVERY AND QUALITY CONTROL 

CNV discovery was performed for each MSKCC sample using
two parallel methods. First, we applied a hidden Markov model
(HMM)-based algorithm implemented in the PennCNV package
(2009 Aug 27 release; Wang et al., 2007). PennCNV makes
use of normalized probe intensity (R) and allelic intensity ratio
(BAF) values measured across different probes on the bead array
to detect regions of copy number variation in the sample. For each
probe, a ratio of the observed R value to the expected R value is
calculated (here, the expected value is pre-defined as the average
intensity observed at the locus in a pool of healthy HapMap individuals
from CEU, YRI, and CHB-JPT populations). Positive or
negative deviations of the log(R_observed/R_expected) ratio (LRR) from
zero indicate that the locus may be either duplicated or deleted,
respectively. The algorithm incorporates spatial information as
well, such that the probability of transitioning between different
copy number states is dependent upon the physical map distance
between two adjacent loci.
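
As a small worked illustration of the LRR definition (this is not PennCNV code), the sketch below computes LRR for three probes. It assumes a base-2 logarithm, which is not stated in the text, and the intensity values are invented for the example.

```python
import math


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """LRR = log2(R_observed / R_expected); the base-2 log is an assumption."""
    if r_observed <= 0 or r_expected <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(r_observed / r_expected)


# Invented intensities for three probes sharing the same expected value.
expected = 1.0
examples = [
    ("no change", 1.0),
    ("possible deletion", 0.5),
    ("possible duplication", 1.5),
]
for label, observed in examples:
    print(f"{label}: LRR = {log_r_ratio(observed, expected):+.2f}")
```

An LRR of zero means the observed intensity matches the reference pool; negative and positive values point toward loss and gain, respectively.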

PennCNV was applied to 474 individual samples (including 10
duplicate pairs) using default parameters and GC-wave correction
(Diskin et al., 2008). From each duplicate pair, one sample was
kept for downstream analysis on the basis of having the lowest
LRR value. Quality control (QC) was then applied at the sample
level to PennCNV output by excluding samples on the basis of:
(1) LRR standard deviation >0.28 (mean LRR plus 3 times the
interquartile range); (2) BAF standard deviation >0.13 (approximately
the mean BAF plus three times the interquartile range); or
(3) a total number of CNV calls >124 (approximately the mean
call rate plus 1.5 times the interquartile range). Additional QC was
applied at the CNV level by excluding CNV calls <5 kb in length
and spanning <5 probes. After QC, we derived a total of 11,635
CNV calls from 417 unique samples using PennCNV.
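
A cutoff of the form "mean plus k times the interquartile range" can be derived as in the sketch below. The quartile convention (Python's "inclusive" method) is an assumption, since the text does not say how quartiles were computed, and the function names and sample values are hypothetical.

```python
import statistics


def outlier_cutoff(values, k):
    """Upper cutoff defined as mean + k * IQR of a per-sample metric."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.mean(values) + k * (q3 - q1)


def failing_samples(metric_by_sample, k):
    """Return the sorted IDs of samples whose metric exceeds the cutoff."""
    cutoff = outlier_cutoff(list(metric_by_sample.values()), k)
    return sorted(s for s, v in metric_by_sample.items() if v > cutoff)


# Invented LRR standard deviations for five samples.
lrr_sd = {"s1": 0.10, "s2": 0.12, "s3": 0.11, "s4": 0.13, "s5": 0.90}
print(failing_samples(lrr_sd, k=3))
```

The same helper would apply to the BAF standard deviation and the per-sample CNV call count by changing `k` to the multiplier quoted for each metric.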

Secondly, we applied an Objective Bayes HMM-based algorithm
implemented in QuantiSNP (version 2.3 beta; Colella
et al., 2007). QuantiSNP was run using default parameters and
GC-wave correction on 474 individual samples (including 10
duplicate pairs). One sample from each duplicate pair was kept
for downstream analysis on the basis of having the lowest LRR
value. QC was then applied at the sample level by excluding
samples on the basis of: (1) LRR standard deviation >0.21
(mean plus 3 times the interquartile range); (2) BAF standard
deviation >0.102 (approximately the mean plus three times the
interquartile range); or (3) a total number of CNV calls >160
(approximately the mean call rate plus two times the interquartile
range). Additional QC was applied at the CNV level by excluding
CNV calls with logBF confidence scores <15. After QC, we
derived a total of 5422 CNV calls from 414 unique samples using
QuantiSNP.

Lastly, as an additional QC procedure, we retained only those
CNV calls that were made in the same individual by both
PennCNV and QuantiSNP. Any sample or CNV call that was
present in just one of the result sets was excluded. The boundaries
of each merged CNV call were defined using the smallest
starting coordinate and largest end coordinate from either algorithm.
This procedure yielded 3520 merged CNV calls from 392
unique individuals.
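
The concordance step can be sketched as below. The matching rule is an assumption: two calls are treated as the same event when they come from the same sample and chromosome, have the same type, and share at least one base. The text does not state the exact rule used, and the `Call` record is hypothetical.

```python
from typing import NamedTuple


class Call(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    kind: str  # "del" or "dup"


def overlaps(a: Call, b: Call) -> bool:
    """True if two calls share at least one base (assumed matching rule)."""
    return (
        a.sample == b.sample
        and a.chrom == b.chrom
        and a.kind == b.kind
        and a.start <= b.end
        and b.start <= a.end
    )


def merge_concordant(calls_a, calls_b):
    """Keep calls from the first caller that the second caller supports,
    using the outermost boundaries reported by either caller."""
    merged = []
    for a in calls_a:
        partners = [b for b in calls_b if overlaps(a, b)]
        if partners:
            start = min([a.start] + [b.start for b in partners])
            end = max([a.end] + [b.end for b in partners])
            merged.append(Call(a.sample, a.chrom, start, end, a.kind))
    return merged
```

Calls seen by only one algorithm drop out, and any sample left without a concordant call disappears from the merged set as a consequence.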

CNV ANNOTATION 

The start and end coordinates of each CNV in our dataset
were based on the March 2006 human genome build
(NCBI36/hg18). For comparison to previously reported
CNV loci, we obtained the 2012-03-29 release of the Database of
Genomic Variants (DGV).

DEFINITION OF CNV REGIONS (CNVRs) 

CNVRs were defined as any contiguous segment of the genome 
spanned by a CNV call from any sample. To identify CNVRs, 
we applied an iterative clustering procedure to the QC-filtered 
CNV calls, whereby CNV calls with a mutual overlap of >40% 
were considered to be members of the same CNVR cluster. The 
boundaries of the CNVR clusters were "relaxed," such that the 
starting and ending coordinates were set by the smallest and
largest coordinates of any CNV that was a member of the cluster, 
respectively. 
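
A minimal sketch of this clustering for calls on a single chromosome is given below. It assumes a single-linkage rule (a call joins a cluster if it has >40% mutual overlap with any member), which is one possible reading of the iterative procedure; the function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for (start, end) intervals."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def cluster_cnvs(calls, min_overlap=0.40):
    """Group (start, end) calls into CNVRs and return relaxed boundaries."""
    clusters = []
    for call in sorted(calls):
        # Find every existing cluster with a sufficiently overlapping member.
        hits = [
            c for c in clusters
            if any(reciprocal_overlap(call, m) > min_overlap for m in c)
        ]
        merged = [call]
        for c in hits:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    # "Relaxed" boundaries: outermost coordinates of any member call.
    return sorted(
        (min(s for s, _ in c), max(e for _, e in c)) for c in clusters
    )
```

With this rule, two calls that barely touch stay in separate CNVRs even though their coordinates overlap, because neither covers more than 40% of the other.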

IN SILICO CNVR GENOTYPING

In silico CNVR genotyping was performed using the CNVtools
package (Barnes et al., 2008). For each CNVR of interest, we
systematically evaluated the parameter space of data summarization
methods, number of copy-number components, and
signal/variance model specifications:

1. Data summarization. As starting input for CNVtools, we
extracted the normalized signal intensities of probes on the
array that mapped within the boundaries of a given CNVR of
interest. The probe intensities for a given region and sample
were summarized using one of two methods: principal
component analysis or simply taking the mean. For each
CNVR under investigation, we selected the method which gave
the best separation between different copy-number clusters by
visual inspection.



Table 1 | Association of demographic and experimental factors with
CNV burden.

Variable                    Beta         P-value

DNA SOURCE
  Blood                     1.0 (ref)
  Buccal                    0.72         0.29
  Saliva                    0.69         0.38
Age                         0.02         0.30
GENDER
  Female                    1.0 (ref)
  Male                      0.38         0.30
EXPERIMENTAL BATCH
  Batch 1                   1.0 (ref)
  Batch 2                   3.07         0.02
  Batch 3                   2.26         0.08
  Batch 4                   -0.14        0.91
  Batch 5                   1.26         0.37
  Batch 6                   1.41         0.33
  Batch 7                   0.36         0.81
  Batch 8                   0.41         0.78
  Batch 9                   1.26         0.32
  Batch 10                  1.17         0.35
  Batch 11                  3.64         0.005
  Batch 12                  1.14         0.38
  Batch 13                  2.86         0.03
  Batch 14                  1.21         0.35
  Batch 15                  0.46         0.75
  Batch 16                  4.41         0.003
  Batch 17                  2.09         0.11
  Batch 18                  1.71         0.18
  Batch 19                  0.66         0.60
  Batch 20                  0.63         0.63
  Batch 21                  0.44         0.74
  Batch 22                  2.36         0.22
  Batch 23                  -1.39        0.47
  Batch 24                  -0.14        0.91
  Batch 25                  0.00         1.00
  Batch 26                  3.58         0.001
  Batch 27                  3.24         0.002
  Batch 28                  3.57         0.02
PLATFORM
  Illumina CNV370-duo       1.0 (ref)
  Illumina CNV370-quad      2.10         3.66 × 10⁻⁷

Differences in CNV burden between different levels of a factor were evaluated
using linear regression.

2. Copy-number components. For each CNVR under investigation,
we used the Bayesian Information Criterion (BIC)
and our subjective visual assessment of clustering quality to
select the optimal number of copy-number classes used for
genotyping.

3. Signal and variance model selection. For each CNVR under
investigation, we explored different combinations of linear
models to describe the signal mean and variance of each
copy-number class (model.mean and model.var parameters,
respectively). We considered models for signal mean that were
"free" (stratified by copy-number class) or proportional to
copy-number. Similarly, we considered models for signal variance
that were either free (stratified by copy-number class),
proportional to copy-number, or constant for each copy-number.
Selection was based on successful convergence of the
model fitting procedure and visual assessment of the clustering
quality.
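
For the BIC criterion mentioned in step 2, the comparison between candidate numbers of copy-number classes can be sketched as follows. The sketch uses the standard definition BIC = k ln(n) - 2 ln(L), where lower is better; the log-likelihoods and parameter counts are invented, and this is not a reproduction of the CNVtools implementation.

```python
import math


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def best_by_bic(fits, n_samples):
    """Pick the component count with the lowest BIC.

    `fits` maps number of components -> (log-likelihood, number of parameters).
    """
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_samples))


# Invented mixture-model fits for one CNVR genotyped in 295 samples.
fits = {1: (-500.0, 2), 2: (-420.0, 5), 3: (-418.0, 8)}
print(best_by_bic(fits, n_samples=295))
```

In this invented example the third component improves the likelihood only slightly, so the penalty for its extra parameters outweighs the gain. As described above, the BIC choice was combined with visual assessment rather than applied on its own.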

Individual CNVR loci were excluded from downstream risk association
testing on the basis of being too problematic for in silico
CNVR genotyping, e.g., rare or singleton (detected in only one
sample) events, and noisy or insufficient separation between
different copy-number clusters.

STATISTICAL METHODS TO EVALUATE RISK ASSOCIATION 
Comparison of CNV burden 

In this study, "CNV burden" was estimated on a per-individual 
basis by counting the number of CNV calls made in a given 
individual. We compared the estimated CNV burden between 



Table 2 | Characteristics of the CNV discovery samples.

                                Cases (n = 223)   Controls (n = 169)   P-value^a
                                n (%)             n (%)

GENDER
  Male                          129 (57.8)        71 (42.0)            0.003
  Female                        94 (42.2)         98 (58.0)
AGE
  <50                           58 (26.0)         33 (19.5)            0.45
  51-60                         60 (26.9)         54 (32.0)
  61-70                         78 (35.0)         62 (36.7)
  >70                           27 (12.1)         20 (11.8)
FAMILY HISTORY
  Yes                           24 (10.8)         0 (0.0)              <0.001
  No                            197 (88.3)        169 (100.0)
  Missing                       2 (0.9)           0 (0.0)
DNA SOURCE
  Blood                         31 (13.9)         0 (0.0)              <0.001
  Buccal                        156 (70.0)        143 (84.6)
  Saliva                        36 (16.1)         26 (15.4)
SNP ARRAY PLATFORM                                                     <0.001
  Illumina Human CNV370-duo     144 (64.6)        151 (89.3)
  Illumina Human CNV370-quad    79 (35.4)         18 (10.7)

^a P-value based on Fisher's exact test.



cases and controls using either univariate or multivariate logistic
regression models in R. Notably, as reported by others,
estimates of CNV burden are susceptible to non-specific sources
of bias, including batch effects, DNA source effects, and age
(International HapMap 3 Consortium et al., 2010). Therefore,
we analyzed the effects of DNA source, SNP array platform
(Illumina Human CNV370-duo vs. CNV370-quad), and experimental
batch on CNV burden using univariate linear regression
models (Table 1). Variables that were associated with CNV burden
(p < 0.1) were used as covariates in a multivariable logistic
regression model to test for the association of CNV burden with
risk. Our final multivariate model adjusted for age, gender, the
top four principal components of genetic ancestry, experimental
batch, and SNP array platform. The case-control comparison
was made for (1) both deletion and duplication calls together,
(2) deletion calls alone, or (3) duplication calls alone. In either
univariate or multivariate models, we estimated the odds ratio
per unit of CNV burden, and significance was determined using
the 1-degree-of-freedom (df) Wald test.
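
The univariate version of this test can be sketched in plain Python as below. The study fitted its models in R; this sketch is a hand-written Newton-Raphson fit for a single predictor with no covariates, and the burden counts and case-control labels are invented.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (inverse information) x (score) to the estimates.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # The slope variance is the matching diagonal entry of the inverse
    # information matrix from the last iteration; at convergence the final
    # update is negligible.
    return b1, math.sqrt(h00 / det)


def wald_test(beta, se):
    """Two-sided p-value of the 1-df Wald test (z = beta / se)."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Invented CNV burden counts and case (1) / control (0) labels.
burden = [5, 7, 8, 9, 10, 12, 6, 8, 11, 13]
status = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

slope, se = fit_logistic(burden, status)
print(f"OR per unit of CNV burden = {math.exp(slope):.2f}, "
      f"Wald p = {wald_test(slope, se):.3f}")
```

The odds ratio per unit of burden is the exponential of the fitted slope. The multivariate models described above would add the covariates as further columns, which is better left to a statistics package than to a hand-written solver.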

Single-locus risk association testing 

We evaluated individual CNVRs for association with risk using 
two approaches. First, we used the approach implemented in 
CNVtools and described previously (Barnes et al., 2008). Briefly, 




FIGURE 1 | Population structure of the study samples revealed by
principal component analysis. Following SNP array genotyping, we
applied the EIGENSTRAT package to 43,909 pairwise independent
(r² < 0.1) SNPs with minor allele frequency (MAF) >0.05 and call
rates >95% among the 223 pancreatic cancer cases, 169 controls, and 253
individuals from reference HapMap CEU and TSI populations. A plot of the
top two principal components of genetic variation (PC1 and PC2) is shown
with cases as red dots, controls as black dots, CEU reference samples as
blue triangles, and TSI reference samples as green triangles. As expected
for our New York-based population study, the majority of cases and controls
clustered with either the CEU reference samples (i.e., central European
ancestry) or TSI (southern Italian ancestry). A subset of cases and controls
(representing those with Ashkenazi Jewish ancestry) clustered separately.

for each CNVR locus, the approach is to jointly fit two models:
(1) a Gaussian mixture model describing the relationship of signal
intensity to copy-number genotype and (2) a generalized logit
linear model describing the log-linear relationship of case-control
status to copy-number phenotype. The models are fit twice,
once under the null hypothesis of no risk association and again
under the alternative hypothesis. A likelihood ratio test is then
performed comparing the likelihood of the two fits with 1 df.
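
The final step, turning two maximized log-likelihoods into a 1-df test, can be sketched as follows. The log-likelihood values are assumed to come from the two model fits; the function name is hypothetical and the sketch does not reproduce the CNVtools fitting itself.

```python
import math


def likelihood_ratio_test(loglik_null: float, loglik_alt: float):
    """Return (statistic, p-value) for a 1-df likelihood ratio test."""
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from fitting noise
    # For 1 df, the chi-square survival function equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Because the alternative model has one more free parameter than the null model, the statistic is compared against a chi-square distribution with one degree of freedom.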

Our second approach was a multivariate logistic regression
model that adjusted for age, gender, the top four principal components
of genetic ancestry, and DNA source. For each locus of
interest, copy-number genotypes were obtained based on in silico
genotyping (described above); we estimated the per-copy odds
ratio (OR) and significance was determined using the 1-df Wald
test.

RESULTS 

CHARACTERISTICS OF THE STUDY PARTICIPANTS 

Table 2 describes the characteristics of 223 pancreatic cancer
cases and 169 healthy controls included in our analyses. The
majority of samples were processed on the Illumina Human
CNV370-duo platform using DNA collected from buccal sources
(mouthwash or saliva). We observed a significant association of
case-control status with gender (p-value = 0.003), family history
(p-value < 0.001), DNA source (p-value < 0.001), and array
platform (p < 0.001). Following SNP array genotyping, principal
component analysis revealed that the majority of our cases and
controls clustered into northern and southern European genetic
ancestry groups (Figure 1). We also observed a smaller subset
of individuals that clustered separately into a group identified as
Ashkenazi Jewish.

CNV DISCOVERY 

Our approach to CNV discovery is summarized in Figure 2. After 
sample-level and CNV-level quality control filtering, we derived 
a total of 3520 high-confidence CNV calls from 223 cases and 
169 controls. Of the total 3520 CNV calls, 1912 (54.3%) were 
deletions and 1608 (45.7%) were duplications. The median CNV 
length was 50.3 kb and 135.2 kb for deletions and duplications, 
respectively. 

Notably, using a minimum overlap threshold of 40%, we found
that 3407 (96%) of the CNVs discovered in our study were overlapped
by a CNV previously reported in the DGV. Of the remaining
113 putatively novel CNVs, 17 (1.5%) were observed among
study participants with Ashkenazi Jewish ancestry. Furthermore,
377 CNVs (194 deletions, 183 duplications) were found to be
"singletons" in our study (i.e., detected in only one study sample).

To determine whether non-specific technical factors influenced
our CNV discovery results, we first compared the distributions
of CNV call rates across different genotyping batches



[Figure 2 flowchart]
263 cases and 201 controls: Illumina Human CNV370 Duo/Quad SNP array genotyping.
PennCNV: 33,915 CNV calls. QC removed low-quality samples (46), CNVs from
low-quality samples (13,266), CNVs <5 kb (4,600), and CNVs spanning <5 probes
(4,414), leaving 11,635 CNV calls.
QuantiSNP: 53,440 CNV calls. QC removed low-quality samples (50), CNVs from
low-quality samples (18,572), and CNVs with log Bayes factor <15 (29,446),
leaving 5,422 CNV calls.
Merged: 3,520 CNV calls in 223 cases and 169 controls.

FIGURE 2 | Schematic overview of the CNV discovery pipeline.
Whole-genome SNP array genotyping was applied to the germline DNA of 263
pancreatic cancer patients (cases) and 201 healthy individuals (controls).
Following normalization, probe intensities were analyzed separately by two
CNV detection algorithms, PennCNV and QuantiSNP. Quality-control filtering
was applied to the outputs of these algorithms by removing low-quality samples
and/or low-confidence CNV calls. This resulted in a final set of 3520 putatively
high-quality, high-confidence CNV calls across 223 cases and 169 controls.

FIGURE 3 | Box-and-whisker plots of the number of CNV calls made
within different experimental batches. Whole-genome SNP array
genotyping was performed in 28 batches. Based on the final derived set of
3520 QC-filtered CNV calls, the interquartile range and median number of
calls in a given genotyping batch are represented by a white box and black
bar, respectively. The whiskers are drawn to 1.5 times the interquartile range;
circles are drawn to represent individuals with a total number of CNV calls
beyond that range.



(Figure 3). Indeed, significant batch-to-batch variation was
observed, suggesting that experimental "batch effects" may have
played a role. Similarly, we observed significant differences in
the distributions of CNV call rates for samples genotyped on
the Illumina Human CNV370-duo vs. -quad platform (Figure 5).
In contrast, no significant differences were observed when comparing
the CNV call rate across different sample DNA sources
(Figure 4).

Finally, using an iterative clustering procedure (described in 
Materials and Methods), we collapsed the 3520 individual CNVs 
into 809 unique CNV regions (CNVRs), i.e., continuous segments 
of the genome spanned by one or more CNVs (Figure 6). 

COMPARISON OF CNV BURDEN BETWEEN CASES AND CONTROLS 
Global CNV burden 

Under the hypothesis that CNV burden is a risk factor for pancreatic
cancer, we first sought to compare CNV burden between
cases and controls. Here, considering all 3520 CNVs discovered in
our study regardless of frequency, we defined CNV burden as the
number of CNV calls made in an individual. This measure was
derived on a per-individual basis by counting (1) deletions and
duplications together, (2) deletions only, or (3) duplications only
and then averaged across case and control groups (Table 3).

The average case/control CNV burden ratio was observed to
be 1.07 counting all CNV types together, 0.98 counting deletions
only, and 1.17 counting duplications only. To assess whether these
differences in CNV burden were significant, we employed a logistic
regression model and estimated the odds ratio (OR) per unit



[Figure 4 groups: Blood (n = 31), Buccal (n = 299), Saliva (n = 62)]

FIGURE 4 | Box-and-whisker plots of the number of CNV calls made
across different DNA sources. Germline DNA was extracted from either a
blood, buccal, or saliva biospecimen offered by each individual in our study.
Based on the final set of 3520 QC-filtered CNV calls, the interquartile range
and median number of calls derived from a given DNA source are
represented by a shaded box and black bar, respectively. The whiskers are
drawn to 1.5 times the interquartile range; circles are drawn to represent
individuals with a total number of CNV calls beyond that range. We
observed moderate (but non-statistically significant) variation in the number
of CNVs detected between the DNA sources.

[Figure 5 groups: Illumina CNV370-duo (n = 295), Illumina CNV370-quad (n = 97)]

FIGURE 5 | Box-and-whisker plots of the number of CNV calls made
between different configurations of the genotyping platform.
Whole-genome SNP array genotyping was performed on two different
configurations of the Illumina HumanCNV370 bead array: the duo and quad.
Based on the final set of 3520 QC-filtered CNV calls, the interquartile range
and median number of calls derived from each configuration are
represented by a shaded box and black bar, respectively. The whiskers are
drawn to 1.5 times the interquartile range; circles are drawn to represent
individuals with a total number of CNV calls beyond that range. We
observed statistically significant differences in the number of CNVs
detected between the two configurations.



of CNV burden. Under a univariate model, we observed no significant
association between pancreatic cancer risk and CNV burden
when counting all CNV types together (OR = 1.05, p = 0.12)
or deletions only (OR = 0.99, p = 0.73). A nominally significant
(p-value < 0.05) association was observed when counting duplications
only (OR = 1.10, p = 0.02). However, under a multivariate
model controlling for age, gender, genetic ancestry, and
the non-specific effects of experimental batch and SNP array
platform, we observed no statistically significant associations.

We further hypothesized that CNV burden would be enriched
in patients with a family history of pancreatic cancer or early-onset
disease. To evaluate this, we compared the CNV burden
between controls and cases (n = 24) who reported a history of 
pancreatic cancer in at least one first-degree relative. Similarly, 
we compared CNV burden between controls (n = 33) aged 50 
or younger and cases (n = 58) diagnosed at or prior to age 50. 
Again, although we observed minor differences in case/control 
CNV burden, these differences were not statistically significant in 
either univariate or multivariate analysis. 

Putative rare or de novo CNV burden 

To explore whether putative rare or de novo CNV burden is 
associated with pancreatic cancer risk, we extended the above 
analysis by considering only the subset of 377 CNV calls detected 
in a single individual in our study (Table 4). In both univariate 
and multivariate analyses, no statistically significant case-control 
differences in CNV burden were detected. 



FIGURE 6 | Clustering of CNVs from different samples to identify
common CNV regions (CNVRs). In this illustrated example, each green
bar represents a CNV call detected in a single individual (either a case or
control), plotted by chromosome position. CNVs with reciprocal overlap of
at least 40% were clustered into the same CNVR.



ANALYSIS OF CNV LOCI PREVIOUSLY DISCOVERED IN FAMILIAL 
PANCREATIC CANCER 

Next, we compared our CNV discovery results to loci that have
been previously implicated in familial pancreatic cancer by scanning
for CNVs that overlapped with 28 duplications and 25 deletions
identified by Lucito et al. and 40 duplications and 53 deletions
identified by Al-Sukhni et al. (Table 5). Seven overlapping
CNV loci of the same type were identified, including 4 deletions
and 3 duplications. At five of these loci, CNV events were
observed in both our cases and controls together, or our controls
alone. However, for the remaining two CNV loci (a duplication
at chr18:6838462-7291170 and deletion at chr9:2235919-2351848)
we observed a single CNV event exclusively in one of
our cases.

ASSOCIATION OF INDIVIDUAL CNVR LOCI WITH PANCREATIC 
CANCER RISK 

Lastly, we evaluated whether specific CNVR loci were associated
with pancreatic cancer risk. To facilitate robust in silico CNVR
genotyping and to avoid potential biases in signal characteristics
between the Illumina CNV370-duo and Illumina CNV370-quad
platforms, we focused this analysis on the subset of 295
samples (144 cases, 151 controls) genotyped using Illumina
Human CNV370-duo. In silico copy-number genotyping was
attempted across 706 CNVR loci that were derived from samples
in this subset (Figure 7, described in Materials and Methods).
Of those loci, only 176 were successfully genotyped with high
quality.

Each CNVR that could be successfully genotyped was then 
analyzed for association with pancreatic cancer risk by use of a 
likelihood ratio test (Table 6). We observed a total of seven loci 
associated with p-values < 0.05. Considering the number of loci 
tested, only one association (PA-CNVR46.1, likelihood ratio test
p = 6.41 × 10^-5) met the Bonferroni threshold of significance.
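
The Bonferroni comparison can be reproduced approximately with a short calculation. The sketch below assumes that each likelihood ratio statistic is referred to a chi-square distribution with one degree of freedom; this is not stated in this section, but it gives values close to those reported in Table 6.

```python
import math


def lrt_p_value(statistic: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 df (assumed).
    Negative statistics (e.g. from non-converged fits) are treated as zero."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))


N_TESTS = 176   # CNVR loci successfully genotyped
ALPHA = 0.05
bonferroni_threshold = ALPHA / N_TESTS   # about 2.84e-4

top_p = lrt_p_value(15.98)      # PA-CNVR46.1
second_p = lrt_p_value(10.79)   # second-ranked locus in Table 6

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"top locus p = {top_p:.2e}, second locus p = {second_p:.2e}")
```

Under this assumption only the top-ranked statistic falls below the corrected threshold, in agreement with the text.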
Table 3 | Comparison of global CNV burden in cases and controls based on the final derived set of 3520 QC-filtered CNV calls.



                 Mean CNV burden^a                      Logistic regression
                                                        Univariate          Multivariate^b
CNV type         Cases   Controls   Case/control ratio  OR      P-value     OR      P-value

All cases (n = 223) vs. all controls (n = 169)
  All            9.22    8.66       1.07                1.05    0.12        1.01    0.80
  Deletions      4.84    4.92       0.98                0.99    0.73        1.01    0.78
  Duplications   4.38    3.73       1.17                1.10    0.02        1.00    0.93

Cases with family history (n = 24) vs. all controls (n = 169)
  All            8.83    8.66       1.02                1.02    0.79        1.02    0.78
  Deletions      5.29    4.92       1.07                1.07    0.48        1.07    0.52
  Duplications   3.54    3.73       0.95                0.95    0.67        0.96    0.73

Cases diagnosed age <50 (n = 58) vs. controls age <50 (n = 33)
  All            8.21    8.52       0.96                0.96    0.62        0.93    0.41
  Deletions      4.17    4.97       0.84                0.80    0.07        0.87    0.30
  Duplications   4.03    3.55       1.14                1.16    0.22        0.96    0.79

OR, per-unit CNV burden odds ratio. 

a Mean CNV burden is defined as the average number of CNVs detected in each group of samples, and is derived by counting deletions and duplications together, 
deletions only, or duplications only. 

b Multivariate model adjusted for age, gender, genetic ancestry, experimental batch, and SNP array platform.



Table 4 | Comparison of putative rare or de novo CNV burden in cases and controls based on a subset of 377 CNV calls observed in a single 
individual. 



                 Mean CNV burden^a                      Logistic regression
                                                        Univariate          Multivariate^b
CNV type         Cases   Controls   Case/control ratio  OR      P-value     OR      P-value

All cases (n = 223) vs. all controls (n = 169)
  All            0.96    0.97       0.98                0.99    0.89        1.00    0.97
  Deletions      0.52    0.46       1.13                1.09    0.50        1.12    0.37
  Duplications   0.43    0.51       0.85                0.87    0.32        0.85    0.27

Cases with family history (n = 24) vs. all controls (n = 169)
  All            1.04    0.97       1.07                1.05    0.78        1.03    0.87
  Deletions      0.42    0.46       0.90                0.94    0.82        0.95    0.85
  Duplications   0.63    0.51       1.23                1.20    0.49        1.14    0.65

Cases diagnosed age <50 (n = 58) vs. controls age <50 (n = 33)
  All            0.79    0.97       0.82                0.78    0.34        0.83    0.55
  Deletions      0.31    0.39       0.79                0.80    0.52        0.73    0.49
  Duplications   0.48    0.58       0.84                0.84    0.53        0.94    0.88

OR, per-unit CNV burden odds ratio. 

a Mean CNV burden is defined as the average number of putative rare or de novo CNVs detected in each group of samples, and is derived by counting deletions and 
duplications together, deletions only, or duplications only. 

b Multivariate model adjusted for age, gender, genetic ancestry, experimental batch, and SNP array platform.



However, in a multivariate logistic regression model adjusted
for age, gender, experimental batch and the top four principal
components of genetic ancestry, this region did not remain
statistically significantly associated with risk (per-copy OR = 0.86,
95% CI = 0.58-1.26, p = 0.44).
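
As a rough consistency check, the reported p-value can be recovered from the odds ratio and its confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale; the section does not say how the interval was computed, so the function name and this assumption are illustrative only.

```python
import math


def wald_from_ci(odds_ratio: float, ci_low: float, ci_high: float, z_crit: float = 1.959964):
    """Back out the log-odds estimate, its standard error, the z statistic
    and the two-sided p-value from an OR and a 95% Wald interval (assumed)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p


beta, se, z, p = wald_from_ci(0.86, 0.58, 1.26)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

The recovered value is within rounding error of the reported p = 0.44, given that the odds ratio and interval limits are themselves rounded to two decimals.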

DISCUSSION 

In this study, we sought to investigate the roles of CNV bur- 
den and individual CNV loci in pancreatic cancer susceptibility. 
Toward this end, we first performed genome-wide CNV discov- 
ery within a hospital-based cohort of 223 pancreatic cancer cases 



and 169 healthy controls using a SNP array platform. To help 
minimize the proportion of false-positive CNVs in our data set, 
we took the approach of analyzing whole-genome SNP array data 
using two separate CNV discovery algorithms (PennCNV and 
QuantiSNP) followed by stringent QC filtering. 

A small proportion (n = 113) of the CNV loci detected in
our study were not overlapped by CNVs previously reported
in the DGV. One likely explanation is that these regions are
platform-specific artifacts. Indeed, because we did not
experimentally validate the CNV loci discovered in our study, we
cannot exclude the presence of artifacts in our downstream



Frontiers in Genetics | Applied Genetic Epidemiology 



February 2014 | Volume 5 | Article 29 | 8 



Willis et al. 



CNVs in pancreatic cancer 



Table 5 | Overlap between discovery CNVs (this study) and CNVs previously implicated in familial pancreatic cancer.

                                                                   Number of samples (this study)
                                                                   with an overlapping CNV of same type
CNV locus (hg18)             Type          Reference               All cases   Cases with family   Controls
                                                                   (n = 223)   history (n = 24)    (n = 169)
chr12:130382166-130686668    Deletion      Al-Sukhni et al., 2012  3           1                   2
chr3:60219748-60263116       Deletion      Lucito et al., 2007     3           0                   6
chr18:6838462-7291170        Duplication   Al-Sukhni et al., 2012  1           0                   0
chr9:2235919-2351848         Deletion      Al-Sukhni et al., 2012  1           0                   0
chr11:39882017-40010124      Deletion      Al-Sukhni et al., 2012  0           0                   1
chr19:2984601-5201290        Duplication   Lucito et al., 2007     0           0                   1
chr7:133223330-133393933     Duplication   Al-Sukhni et al., 2012  0           0                   1



[Figure 7 panels]
(A) Illumina CNV370-Duo/Quad SNP array data for 223 cases and 169 controls: extract and summarize signal intensity for probes within 809 CNVRs.
(B) Copy number class assignment (CNVtools): normal copy number, heterozygous deletion, homozygous deletion. x-axis: summarized probe intensity.
(C) Copy number genotypes for 809 CNVRs. QC: remove CNVRs with low-quality genotypes; remove CNVRs with minor allele frequency < 5%; remove CNVRs discovered only on the CNV370-Quad platform. Result: copy number genotypes for 176 CNVRs.



FIGURE 7 | Schematic overview of in silico genotyping of 809 CNVRs
across pancreatic cancer cases and controls. (A) Using SNP array data for
each of the 223 cases and 169 controls, we extracted signal intensity
information for probes that overlapped with each of the CNVRs. Probe
intensities for a given CNVR region were summarized on a sample-by-sample
basis by taking the mean or by use of principal component analysis. (B) In
silico genotyping for each CNVR was then performed using the CNVtools
package, which assigns cases and controls to copy number classes by jointly
fitting a Gaussian mixture model and a log-linear regression model to the
observed distribution of summarized probe intensities. An example fit is
shown, overlaid with the estimated locations of individuals who have normal
copy number class, a heterozygous deletion, or a homozygous deletion (blue,
green, and black lines, respectively) for this CNVR. (C) Quality control was
applied by removing CNVRs with low-quality genotyping, low minor allele
frequency, or CNVRs that were derived solely from samples genotyped on
the Illumina CNV370-quad platform.
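
The idea behind panel (B) can be illustrated with a plain one-dimensional Gaussian mixture fitted by expectation-maximization. This is not the CNVtools procedure, which additionally fits a log-linear regression model jointly with the mixture; the function name, starting means, variance floor, and intensity values below are all assumptions chosen for illustration.

```python
import math


def fit_mixture(values, start_means, n_iter=100, min_var=1e-4):
    """Fit a 1-D Gaussian mixture by EM and return (means, class calls).
    Simplified stand-in for mixture-based copy-number genotyping."""
    k = len(start_means)
    means = list(start_means)
    variances = [0.05] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for x in values:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total == 0.0:
                # Numerically far from every class: fall back to the nearest mean.
                nearest = min(range(k), key=lambda j: abs(x - means[j]))
                resp.append([1.0 if j == nearest else 0.0 for j in range(k)])
            else:
                resp.append([d / total for d in dens])
        # M-step: update weight, mean and variance of each class.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                weights[j] = 0.0
                continue
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    calls = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, calls


# Hypothetical summarized probe intensities for 12 samples at one CNVR.
intensities = [-1.02, -0.98, -0.52, -0.50, -0.48, -0.51,
               0.02, -0.01, 0.00, 0.03, -0.02, 0.01]
# Classes: 0 = homozygous deletion, 1 = heterozygous deletion, 2 = normal.
means, calls = fit_mixture(intensities, start_means=[-1.0, -0.5, 0.0])
```

Each sample is assigned to the class with the highest posterior probability; in real data the clusters overlap far more than in this toy example, which is why loci with poorly separated classes were removed at the QC step.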



analyses despite the application of rigorous filtering methods. In
addition, 17 of the previously unreported CNV loci were observed
among subjects with Ashkenazi Jewish ancestry. Therefore, we
further speculate that at least some of the CNV calls unique
to our study may be true population-specific CNVs from
populations (e.g., Ashkenazi Jewish) that are underrepresented in
the DGV.



Experimental batch effects are well known to the genomics 
field and require proper consideration when performing case- 
control analyses. In this study, we observed significant varia- 
tion in distributions of CNV call rates across different sample 
batches, which is likely the result of technical variation during 
the course of batch processing. In support of this hypothesis, 
we also observed variations in CNV call rates at the individual 
Table 6 | Likelihood ratio tests for the association of 176 CNVR loci with pancreatic cancer risk.

CNVR            Locus (hg18)                Type^a        Likelihood ratio  P-value
PA_CNVR46.1     chr4:[illegible]            Multiallelic  15.98             6.41 × 10^-5
[illegible]     [illegible]                 Multiallelic  10.79             0.001
[illegible]     [illegible]                 Multiallelic  7.09              0.01
[illegible]     [illegible]                 Deletion      6.51              0.01
[illegible]     [illegible]                 Deletion      6.31              0.01
[illegible]     [illegible]                 Deletion      4.87              0.03
[illegible]     [illegible]                 Duplication   4.76              0.03
[illegible]     [illegible]                 Multiallelic  3.87              0.05
[illegible]     [illegible]                 Deletion      3.84              0.05
[illegible]     [illegible]                 Deletion      3.83              0.05
[illegible]     [illegible]                 Duplication   3.80              0.05
[illegible]     [illegible]                 Duplication   3.59              0.06
[illegible]     [illegible]                 Duplication   3.35              0.07
[illegible]     [illegible]                 Deletion      3.29              0.07
[illegible]     [illegible]                 Deletion      3.26              0.07
[illegible]     [illegible]                 Deletion      3.08              0.08
[illegible]     [illegible]                 Deletion      2.79              0.09
[illegible]     [illegible]                 Deletion      2.78              0.10
[illegible]     [illegible]                 Deletion      2.76              0.10
[illegible]     [illegible]                 Deletion      2.68              0.10
[illegible]     [illegible]                 Multiallelic  2.64              0.10
[illegible]     [illegible]                 Multiallelic  2.56              0.11
[illegible]     [illegible]                 Multiallelic  2.48              0.12
[illegible]     [illegible]                 Multiallelic  2.42              0.12
[illegible]     [illegible]                 Deletion      2.41              0.12
[illegible]     [illegible]                 Deletion      2.37              0.12
[illegible]     [illegible]                 Deletion      2.35              0.13
[illegible]     [illegible]                 Deletion      2.31              0.13
[illegible]     [illegible]                 Multiallelic  2.19              0.14
[illegible]     [illegible]                 Multiallelic  2.12              0.14
[illegible]     [illegible]                 Duplication   2.12              0.15
[illegible]     [illegible]                 Deletion      2.08              0.15
[illegible]     [illegible]                 Deletion      2.04              0.15
[illegible]     [illegible]                 Deletion      2.03              0.15
[illegible]     [illegible]                 Multiallelic  2.01              0.16
[illegible]     [illegible]                 Deletion      1.92              0.17
[illegible]     [illegible]                 Deletion      1.90              0.17
[illegible]     [illegible]                 Deletion      1.89              0.17
[illegible]     [illegible]                 Multiallelic  1.88              0.17
[illegible]     [illegible]                 Deletion      1.79              0.18
[illegible]     [illegible]                 Duplication   1.78              0.18
[illegible]     [illegible]                 Deletion      1.76              0.18
[illegible]     [illegible]                 Duplication   1.75              0.19
[illegible]     [illegible]                 Deletion      1.74              0.19
[illegible]     [illegible]                 Deletion      1.68              0.19
PA_CNVR192.1    chr19:40613106-40636215     Deletion      1.68              0.20
PA_CNVR16.1     chr14:85528167-85560365     Duplication   1.53              0.22
PA_CNVR387.1    chr3:116125098-116150586    Deletion      1.46              0.23
PA_CNVR142.1    chr8:137757412-137926509    Deletion      1.42              0.23
PA_CNVR138.1    chr8:115704806-115710821    Deletion      1.41              0.23
PA_CNVR253.2    chr12:31157554-31298174     Duplication   1.41              0.23
PA_CNVR7.2      chr14:39308459-39982197     Deletion      1.35              0.24
[illegible]     [illegible]                 Deletion      1.35              0.25
[illegible]     [illegible]                 Multiallelic  1.34              0.25
[illegible]     [illegible]                 Deletion      1.34              0.25
[illegible]     [illegible]                 Duplication   1.26              0.26
[illegible]     [illegible]                 Duplication   1.26              0.26
[illegible]     [illegible]                 Deletion      1.24              0.27
[illegible]     [illegible]                 Multiallelic  1.20              0.27
[illegible]     [illegible]                 Deletion      1.13              0.29
PA_CNVR438.2    chr9:181843-264641          Multiallelic  1.10              0.29
[illegible]     [illegible]                 Deletion      1.10              0.29
[illegible]     [illegible]                 Deletion      1.10              0.29
[illegible]     [illegible]                 Deletion      1.09              0.30
[illegible]     [illegible]                 Multiallelic  1.06              0.30
[illegible]     [illegible]                 Deletion      1.04              0.31
[illegible]     [illegible]                 Multiallelic  1.00              0.32
[illegible]     [illegible]                 Multiallelic  0.99              0.32
[illegible]     [illegible]                 Deletion      0.99              0.32
[illegible]     [illegible]                 Deletion      0.97              0.33
[illegible]     [illegible]                 Multiallelic  0.96              0.33
[illegible]     [illegible]                 Deletion      0.95              0.33
[illegible]     [illegible]                 Multiallelic  0.91              0.34
[illegible]     [illegible]                 Multiallelic  0.86              0.35
[illegible]     [illegible]                 Deletion      0.85              0.36
[illegible]     [illegible]                 Deletion      0.75              0.39
[illegible]     [illegible]                 Duplication   0.74              0.39
[illegible]     [illegible]                 Multiallelic  0.73              0.39
[illegible]     [illegible]                 Multiallelic  0.73              0.39
[illegible]     [illegible]                 Deletion      0.72              0.39
[illegible]     [illegible]                 Multiallelic  0.65              0.42
[illegible]     [illegible]                 Deletion      0.64              0.42
[illegible]     [illegible]                 Multiallelic  0.63              0.43
[illegible]     [illegible]                 Deletion      0.59              0.44
[illegible]     [illegible]                 Deletion      0.56              0.45
[illegible]     [illegible]                 Multiallelic  0.55              0.46
[illegible]     [illegible]                 Deletion      0.51              0.47
[illegible]     [illegible]                 Deletion      0.50              0.48
[illegible]     [illegible]                 Duplication   0.49              0.48
[illegible]     [illegible]                 Duplication   0.49              0.49
[illegible]     [illegible]                 Deletion      0.46              0.50
[illegible]     [illegible]                 Deletion      0.46              0.50
[illegible]     [illegible]                 Multiallelic  0.45              0.50
[illegible]     [illegible]                 Multiallelic  0.42              0.51
[illegible]     [illegible]                 Deletion      0.39              0.53
[illegible]     [illegible]                 Deletion      0.39              0.53
[illegible]     [illegible]                 Multiallelic  0.38              0.54
PA_CNVR353.2    chr13:17922259-18120572     Duplication   0.38              0.54
PA_CNVR72.1     chr6:31382534-31406722      Deletion      0.37              0.54
PA_CNVR72.41    chr6:31382224-31419324      Deletion      0.37              0.54
PA_CNVR417.1    chr10:46830464-47218918     Multiallelic  0.37              0.54
PA_CNVR514.2    chr16:32467276-32498422     Deletion      0.37              0.54
PA_CNVR517.1    chr16:44943958-45048915     Duplication   0.37              0.54
PA_CNVR392.1    chr3:133185033-133195707    Multiallelic  0.36              0.55
[illegible]     [illegible]                 Deletion      0.36              0.55
[illegible]     [illegible]                 Deletion      0.35              0.55
[illegible]     [illegible]                 Deletion      0.34              0.56
[illegible]     [illegible]                 Deletion      0.32              0.57
[illegible]     [illegible]                 Deletion      0.30              0.59
[illegible]     [illegible]                 Deletion      0.30              0.59
[illegible]     [illegible]                 Deletion      0.29              0.59
[illegible]     [illegible]                 Deletion      0.28              0.59
[illegible]     [illegible]                 Deletion      0.28              0.60
[illegible]     [illegible]                 Deletion      0.27              0.60
[illegible]     [illegible]                 Deletion      0.25              0.62
[illegible]     [illegible]                 Duplication   0.25              0.62
[illegible]     [illegible]                 Deletion      0.24              0.62
[illegible]     [illegible]                 Multiallelic  0.24              0.62
[illegible]     [illegible]                 Deletion      0.24              0.63
[illegible]     [illegible]                 Deletion      0.20              0.65
[illegible]     [illegible]                 Deletion      0.20              0.65
[illegible]     [illegible]                 Deletion      0.20              0.66
[illegible]     [illegible]                 Deletion      0.19              0.66
[illegible]     [illegible]                 Deletion      0.19              0.66
[illegible]     [illegible]                 Multiallelic  0.18              0.67
[illegible]     [illegible]                 Multiallelic  0.18              0.67
[illegible]     [illegible]                 Multiallelic  0.18              0.68
[illegible]     [illegible]                 Duplication   0.17              0.68
[illegible]     [illegible]                 Duplication   0.16              0.69
[illegible]     [illegible]                 Deletion      0.16              0.69
[illegible]     [illegible]                 Deletion      0.16              0.69
[illegible]     [illegible]                 Multiallelic  0.16              0.69
[illegible]     [illegible]                 Duplication   0.14              0.71
[illegible]     [illegible]                 Deletion      0.14              0.71
[illegible]     [illegible]                 Deletion      0.10              0.75
[illegible]     [illegible]                 Deletion      0.08              0.77
[illegible]     [illegible]                 Deletion      0.08              0.77
[illegible]     [illegible]                 Deletion      0.07              0.78
[illegible]     [illegible]                 Deletion      0.07              0.78
[illegible]     [illegible]                 Deletion      0.07              0.79
[illegible]     [illegible]                 Deletion      0.05              0.82
[illegible]     [illegible]                 Deletion      0.05              0.82
[illegible]     [illegible]                 Deletion      0.05              0.83
[illegible]     [illegible]                 Deletion      0.04              0.84
[illegible]     [illegible]                 Deletion      0.03              0.85
[illegible]     [illegible]                 Multiallelic  0.03              0.86
[illegible]     [illegible]                 Multiallelic  0.03              0.86
[illegible]     [illegible]                 Multiallelic  0.03              0.86
[illegible]     [illegible]                 Deletion      0.03              0.87
PA_CNVR473.1    chr5:32137157-322[?]2977    Duplication   0.03              0.87
PA_CNVR158.1    chr11:70966737-71226822     Duplication   0.03              0.87
PA_CNVR227.1    chr1:146409913-146483416    Multiallelic  0.02              0.88
PA_CNVR102.4    chr6:168092530-168162650    Duplication   0.02              0.88
PA_CNVR382.1    chr3:46771354-46825614      Deletion      0.02              0.88
PA_CNVR482.19   chr5:69724106-69791981      Duplication   0.02              0.89
PA_CNVR278.1    chr15:29812822-30302218     Multiallelic  0.02              0.90
[illegible]     [illegible]                 Deletion      0.01              0.91
[illegible]     [illegible]                 Deletion      0.01              0.92
[illegible]     [illegible]                 Deletion      0.01              0.94
[illegible]     [illegible]                 Deletion      0.01              0.94
[illegible]     [illegible]                 Deletion      0.00              0.95
[illegible]     [illegible]                 Multiallelic  0.00              0.95
[illegible]     [illegible]                 Deletion      0.00              0.95
[illegible]     [illegible]                 Deletion      0.00              0.95
[illegible]     [illegible]                 Multiallelic  0.00              0.97
[illegible]     [illegible]                 Deletion      0.00              0.97
[illegible]     [illegible]                 Deletion      0.00              0.97
[illegible]     [illegible]                 Multiallelic  0.00              0.97
[illegible]     [illegible]                 Duplication   0.00              0.97
PA_CNVR182.1    chr17:46849574-46910094     Duplication   0.00              0.98
PA_CNVR175.1    chr17:21967881-22013983     Deletion      0.00              0.99
PA_CNVR71.1     chr6:30854607-30864253      Deletion      0.00              0.99
PA_CNVR237.3    chr1:195089923-195168376    Multiallelic  0.00              0.99
PA_CNVR276.4    chr15:28714502-28881771     Multiallelic  0.00              0.99
PA_CNVR579.1    chr22:23983992-24254444     Multiallelic  0.00              1.00
PA_CNVR210.1    chr20:61880157-62011862     Deletion      -1.09             1.00

a CNVR type is defined by the observation of either deletion genotypes only ("deletion"), duplication genotypes only ("duplication"), or both deletion and duplication
genotypes ("multiallelic") among the analyzed individuals.



sample-level by comparing CNVs from 10 duplicate pairs of 
samples that were genotyped twice on the same array platform 
but in two different batches (data not shown). Furthermore, the
distributions of CNV call rates for samples genotyped on the
Illumina Human CNV370-duo vs. -quad platforms were significantly
different. While batch effects may have contributed to
this observation, we speculate that it also reflects differences in 
probe-level characteristics between the duo and quad bead array 
configurations. 

Following CNV discovery, we explored several hypotheses that 
CNV burden is a risk factor for pancreatic cancer. Different for- 
mulations of "CNV burden" have been employed in the literature, 
including: a simple CNV count, the total CNV length, and the 
total number of genes overlapped by a CNV. Here, we regarded 
CNV burden as the number of CNV calls made per individual — 
a metric that was evaluated across a spectrum of different CNV 
types and frequency, and different case-control subgroups. 
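
The count-based and length-based formulations of burden can be made concrete with a small sketch. The sample identifiers and coordinates are hypothetical, and the helper names are introduced here only for illustration.

```python
from collections import defaultdict


def cnv_count(calls, cnv_type=None):
    """Number of CNV calls per sample, optionally restricted to one type."""
    counts = defaultdict(int)
    for sample, kind, start, end in calls:
        if cnv_type is None or kind == cnv_type:
            counts[sample] += 1
    return dict(counts)


def cnv_total_length(calls, cnv_type=None):
    """Total number of bases covered by CNV calls per sample."""
    lengths = defaultdict(int)
    for sample, kind, start, end in calls:
        if cnv_type is None or kind == cnv_type:
            lengths[sample] += end - start
    return dict(lengths)


def mean_burden(per_sample, samples):
    """Group mean; samples with no calls contribute a burden of zero."""
    return sum(per_sample.get(s, 0) for s in samples) / len(samples)


# Hypothetical calls: (sample, type, start, end).
calls = [
    ("case_1", "deletion", 1000, 6000),
    ("case_1", "duplication", 20000, 50000),
    ("case_2", "deletion", 1000, 4000),
    ("control_1", "duplication", 7000, 9000),
]
cases = ["case_1", "case_2", "case_3"]

mean_all = mean_burden(cnv_count(calls), cases)
mean_deletions = mean_burden(cnv_count(calls, "deletion"), cases)
length_case_1 = cnv_total_length(calls)["case_1"]
```

Keeping samples without any calls in the denominator matters here, because dropping them would inflate the group mean.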

When we considered all 3520 CNVs discovered in our study 
across all 223 cases and 169 controls, we found no strong evidence 
of an association between CNV burden and pancreatic cancer 
risk. Similarly, no evidence was found when we restricted our 
analyses to cases with early-onset (age < 50) disease or with a 
family history of pancreatic cancer. Notably, a recent study by 
Stadler et al. found that a significant proportion of men with 
early-onset testicular cancer harbored de novo CNVs (Stadler 
et al., 2012). Although the pathogenicity of these specific de 
novo CNVs has yet to be confirmed, this finding suggested a 
novel framework for understanding the genetic basis of sporadic 
cancers. We explored this hypothesis by restricting our analyses to 



singleton CNVs (i.e., putative rare or de novo CNVs), but found 
no evidence of association. 

Although our results do not support the role of CNV bur- 
den in pancreatic cancer risk, we emphasize that our conclusions 
are tempered by the presence of experimental variability in the 
CNV discovery scheme. It remains possible that true differences 
in case/control CNV burden might have been masked by the 
presence of varying DNA sources, genotyping platforms, and 
experimental batch performance. 

We next attempted in silico validation of CNV loci that were
previously implicated in familial pancreatic cancer risk. By
comparing our CNV discovery results against those reported by Lucito
et al. and Al-Sukhni et al., we identified two loci at which CNV
events were present exclusively in our cases and were thus
consistent with the original hypotheses that each locus confers a strong
risk of pancreatic cancer. The first such locus, on chromosome
9p24.2, was reported by Al-Sukhni et al. as a deletion event in
a single case, but harbors no known RefSeq genes. The second
locus, on chromosome 18p11.31, was reported by Al-Sukhni et al. as a
duplication event in a single case and harbors four known genes:
ARHGAP28, LINC00668, LAMA1, and LRRC30. Notably,
LAMA1 (laminin subunit alpha-1 precursor) encodes a subunit
of the extracellular protein laminin. While the specific role of
laminin in pancreatic tumorigenesis remains poorly understood,
a recent study by Vincent et al. found that LAMA1 was among
several genes that were hypermethylated and underexpressed in
pancreatic tumor samples compared to normal pancreas (Vincent
et al., 2011). Our results add further support to the hypothesis
that inherited duplications at the LAMA1 locus may be involved
in pancreatic cancer risk. However, as we did not experimentally
validate the duplication involving LAMA1 in our data set, we
cannot exclude the possibility of either false positives or negatives.

Lastly, to determine whether specific CNV loci play a role
in pancreatic cancer risk, we performed in silico copy-number
genotyping and association testing across 176 CNVR loci
identified in our discovery experiment. Using a likelihood ratio testing
approach, we identified seven regions putatively associated with
pancreatic cancer risk at the nominal threshold of p < 0.05. Only
one locus (PA-CNVR46.1) reached the Bonferroni level of
significance. This locus maps to a non-genic region of chromosome
4q26 and was found to be multiallelic in our samples (i.e., both
deletion and duplication genotypes were detected). However, its
effect on risk was not significant in a multivariate logistic
regression model adjusted for gender, age, ancestry, and DNA source.
Follow-up studies of these seven putative loci would be necessary
to validate their associations with pancreatic cancer risk.

Yet, prior to the conclusion of our study, work performed by
Conrad et al. suggested that nearly 77% of all common CNVs
in the human genome are tagged (r^2 > 0.8) by SNPs through
linkage disequilibrium (Conrad et al., 2009). Hence, one could
speculate that any common CNV locus with weak-to-moderate 
effects on pancreatic cancer risk would have been detected already 
through large-scale GWAS involving several thousand cases and 
controls (Amundadottir et al., 2009; Petersen et al., 2010). Thus, 
in this context, we strongly emphasize that our inability to detect 
such CNV loci is not unexpected given the relatively small sample 
size of our study. 
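
The tagging criterion can be illustrated with a short calculation of r^2 between a CNV genotype and a SNP genotype. The sketch below uses the squared Pearson correlation of unphased genotype counts as a simple stand-in; the genotype vectors are hypothetical, and the exact estimator used by Conrad et al. is not described in this section.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors
    (returns 0.0 when either vector has no variation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


# Hypothetical data for six individuals.
cnv_copies = [2, 2, 1, 2, 1, 2]    # copy number at a deletion CNV
snp_good_tag = [0, 0, 1, 0, 1, 0]  # minor-allele counts at a well-correlated SNP
snp_poor_tag = [0, 0, 1, 0, 1, 1]  # minor-allele counts at a weakly correlated SNP

r2_good = r_squared(cnv_copies, snp_good_tag)
r2_poor = r_squared(cnv_copies, snp_poor_tag)
tagged = [r2 > 0.8 for r2 in (r2_good, r2_poor)]
```

A CNV that is tagged in this sense would be captured indirectly by a SNP-based GWAS, whereas a poorly tagged CNV would require direct copy-number genotyping.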

As a corollary, one could further hypothesize that CNVs with 
large contributions to pancreatic cancer risk are likely to be 
individually rare and poorly tagged by SNPs on commercially 
available genotyping arrays. Hence, we also emphasize that such 
CNV loci were likely to have been missed in our study due to 
not only small sample size, but also the resolution of the Illumina 
Human CNV370 bead array. 

In conclusion, based on the results of genome-wide CNV 
discovery in a hospital-based case-control cohort, our study 
found no evidence that CNVs contribute substantially to the 
genetic etiology of pancreatic cancer. However, in light of recent 
population-wide CNV data and the challenges faced by our study, 
future efforts to address the role of CNVs in pancreatic cancer will 
require larger case-control groups and high-resolution discovery 
platforms. 